- Polynomial regression
- Step functions
- Regression splines
- Smoothing splines
- Generalized additive models
10/27/2019
A simple approach for incorporating non-linear associations in a linear model is to include transformed versions of the predictors in the model, e.g.
\[\text{mpg} = \beta_0 + \beta_1 \text{horsepower} + \beta_2 \text{horsepower}^2 + \varepsilon\]
We are predicting mpg using a non-linear function of horsepower, but it is still a linear model with \(X_1 = \text{horsepower}\) and \(X_2 = \text{horsepower}^2\).
\[y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \dots + \beta_d x_i^d + \varepsilon_i\]
In R, use y ~ poly(x, degree = d) in the formula.
We can then call predict(fit, newdata = ..., se = T) to obtain pointwise standard errors, and use the 2 standard error rule (fit ± 2 se) for an approximate 95% band.
Caveat: polynomials have notoriously bad tail behavior, which makes them very poor for extrapolation. This is because the polynomial is defined globally: it is fitted using all of the data, so every observation influences the fit everywhere, including at the boundaries.
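A minimal sketch of the polynomial fit and the 2 standard error band. It assumes the built-in mtcars data (mpg, hp) as a stand-in for the lecture's data; the grid size and the extrapolation point hp = 500 are arbitrary choices for illustration.

```r
# Assumption: built-in mtcars (mpg, hp) stands in for the lecture's data.
fit <- lm(mpg ~ poly(hp, degree = 2), data = mtcars)

# 50 grid points inside the observed range, plus one point far outside it
hp_grid <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 50)
new_hp  <- data.frame(hp = c(hp_grid, 500))

# Pointwise standard errors and the 2 standard error band
pred  <- predict(fit, newdata = new_hp, se.fit = TRUE)
upper <- pred$fit + 2 * pred$se.fit
lower <- pred$fit - 2 * pred$se.fit

# Largest standard error inside the data range vs. at the extrapolated point
se_inside  <- max(pred$se.fit[1:50])
se_outside <- pred$se.fit[51]
c(inside = se_inside, outside = se_outside)
```

Comparing se_inside with se_outside gives a rough feel for how quickly the uncertainty grows once we leave the range of the data.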
Another way of creating transformations of a variable: cut the variable into distinct regions.
In order to fit a step function in R, we use the cut() function.
Given cutpoints \(c_1, c_2, c_3\) in the range of \(X\), we construct \(4\) dummy variables, e.g. with \(c_1 = 92\), \(c_2 = 138\), \(c_3 = 184\):
\[ \begin{aligned} &C_1(X) = \mathbb{I}(X \leq 92), \\ &C_2(X) = \mathbb{I}(92 < X \leq 138), \\ &C_3(X) = \mathbb{I}(138 < X \leq 184), \\ &C_4(X) = \mathbb{I}(X > 184) \end{aligned}\]
We only use \(3\) of them in the model (the remaining one is the baseline, absorbed by the intercept). In general, \(K\) cutpoints give \(K+1\) intervals, and \(K\) dummy variables enter the model.
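A small sketch of the step function fit, again assuming mtcars as stand-in data. Here cut(hp, 4) chooses 3 equally spaced cutpoints itself; passing explicit breaks would work the same way.

```r
# Assumption: mtcars as stand-in data; cut(hp, 4) splits the range of hp
# into 4 equal-width intervals, i.e. it picks 3 cutpoints automatically.
hp_bins <- cut(mtcars$hp, 4)
table(hp_bins)                       # number of cars per interval

fit_step <- lm(mpg ~ hp_bins, data = mtcars)
coef(fit_step)                       # intercept (baseline interval) + 3 dummies

# The fitted value in each interval is the mean mpg of that interval
bin_means <- tapply(mtcars$mpg, hp_bins, mean)
bin_means
```

The intercept is the mean response in the baseline interval, and each dummy coefficient is the difference between that interval's mean and the baseline.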
Let’s take the best of the two previous ideas:
- smoothness and flexibility, from polynomial regression
- local support, from the step function approach
Instead of a single polynomial in \(X\) over its whole domain, we can rather use different polynomials in regions defined by knots, e.g. \[y_i = \begin{cases} \beta_{01} + \beta_{11} x_i + \beta_{21} x_i^2 + \beta_{31} x_i^3 + \varepsilon_i, \quad &\text{if } x_i \leq c \\ \beta_{02} + \beta_{12} x_i + \beta_{22} x_i^2 + \beta_{32} x_i^3 + \varepsilon_i, \quad &\text{if } x_i > c \end{cases}\]
Better to add constraints to the polynomials, e.g. continuity
Splines have the “maximum” amount of continuity
Two distinct \(3^{rd}\) degree polynomials. How many degrees of freedom (parameters)?
Two distinct \(3^{rd}\) degree polynomials with continuity. How many degrees of freedom (parameters)?
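Two \(3^{rd}\) degree polynomials joined as a cubic spline. How many degrees of freedom (parameters)? A spline of degree \(d\) imposes continuity of the function and of its derivatives up to order \(d-1\) at each knot, i.e. of the first and second derivatives in the cubic case.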
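The three counts can be worked out by hand for the one-knot cubic case. This is standard parameter counting (each constraint at the knot removes one free parameter), added here as a worked example:

\[\begin{aligned} &\text{two unconstrained cubics:} && 2 \times 4 = 8 \\ &\text{with continuity at } c: && 8 - 1 = 7 \\ &\text{with continuous first and second derivatives as well:} && 8 - 3 = 5 \end{aligned}\]

In general, a cubic spline with \(K\) knots has \(4(K+1) - 3K = K + 4\) parameters.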
A linear spline with knots at \(\xi_k, k = 1,2,\dots, K\) is a piecewise linear polynomial continuous at each knot. We can represent this model as \[y_i = \beta_0 + \beta_1 b_1(x_i)+ \beta_2 b_2(x_i)+ \dots + \beta_{K+1} b_{K+1}(x_i) + \varepsilon_i\]
where the \(b_k\) are basis functions (truncated power basis) \[\begin{aligned} &b_1(x_i) = x_i \\ &b_{k+1}(x_i)=(x_i - \xi_k)_{+}, \quad k = 1,2,\dots,K \end{aligned}\] Here the \((\dots)_{+}\) means positive part, i.e.
\[(x_i - \xi_k)_{+} = \begin{cases} x_i - \xi_k & \text{if } x_i > \xi_k \\ 0 & \text{otherwise} \end{cases}\]
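The linear truncated power basis can be built by hand, as in the sketch below. The simulated data, the knot locations and the names B_lin and fit_lin are assumptions made for illustration only.

```r
set.seed(1)
x <- sort(runif(200, 0, 10))          # simulated predictor (assumption)
y <- sin(x) + rnorm(200, sd = 0.3)    # simulated response (assumption)
knots <- c(2.5, 5, 7.5)               # hypothetical knots xi_1, ..., xi_K

# b_1(x) = x and b_{k+1}(x) = (x - xi_k)_+ ; pmax(., 0) is the positive part
B_lin <- cbind(x, sapply(knots, function(k) pmax(x - k, 0)))
colnames(B_lin) <- paste0("b", seq_len(ncol(B_lin)))

# An ordinary linear model on the basis columns: intercept + K + 1 coefficients
fit_lin <- lm(y ~ B_lin)
coef(fit_lin)
```

The coefficient of each \((x - \xi_k)_{+}\) term is the change in slope at knot \(\xi_k\), so the fitted function stays continuous while its slope is allowed to jump.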
A cubic spline with knots at \(\xi_k, k = 1,2,\dots, K\) is a piecewise cubic polynomial with continuous derivatives up to order 2 at each knot. We can represent this model as \[y_i = \beta_0 + \beta_1 b_1(x_i)+ \beta_2 b_2(x_i)+ \dots + \beta_{K+3} b_{K+3}(x_i) + \varepsilon_i\]
where the \(b_k\) are basis functions \[\begin{aligned} &b_1(x_i) = x_i \\ &b_2(x_i) = x_i^2 \\ &b_3(x_i) = x_i^3 \\ &b_{k+3}(x_i)=(x_i - \xi_k)^{3}_{+}, \quad k = 1,2,\dots,K \end{aligned}\]
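As a check on the cubic basis above, the sketch below builds it by hand and compares the fit with the one obtained from bs() in the splines package. bs() uses a different (B-spline) basis for the same function space, so the coefficients differ but the fitted values should agree up to numerical error. The data, the knots and the helper tp_basis are hypothetical.

```r
library(splines)

set.seed(1)
x <- sort(runif(200, 0, 10))          # simulated predictor (assumption)
y <- sin(x) + rnorm(200, sd = 0.3)    # simulated response (assumption)
knots <- c(2.5, 5, 7.5)               # hypothetical knots xi_1, ..., xi_K

# Hypothetical helper: x, x^2, x^3 and (x - xi_k)^3_+ for each knot
tp_basis <- function(x, knots) {
  cbind(x, x^2, x^3, sapply(knots, function(k) pmax(x - k, 0)^3))
}

B_cub  <- tp_basis(x, knots)                          # K + 3 columns
fit_tp <- lm(y ~ B_cub)                               # truncated power basis
fit_bs <- lm(y ~ bs(x, knots = knots, degree = 3))    # B-spline basis

# Largest difference between the two sets of fitted values
max_diff <- max(abs(fitted(fit_tp) - fitted(fit_bs)))
max_diff
```

In practice the B-spline basis is usually preferred because it is numerically better conditioned than the truncated power basis.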
\[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 b_1(x_i)+ \hat{\beta}_2 b_2(x_i)+ \dots + \hat{\beta}_{K+3} b_{K+3}(x_i)\]
In R, we can simply fit a “linear” model on the spline basis!
library(splines)  # provides bs()
fit <- lm(Y2 ~ bs(X, df = 10, degree = 3))
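A self-contained version of this fit might look as follows. The simulated X and Y2 are assumptions standing in for the lecture's data, and knots_used, grid and pred are illustrative names.

```r
library(splines)

set.seed(42)
X  <- sort(runif(300, 0, 10))                     # simulated stand-in for X
Y2 <- sin(X) + 0.1 * X + rnorm(300, sd = 0.3)     # simulated stand-in for Y2

fit <- lm(Y2 ~ bs(X, df = 10, degree = 3))

# With degree = 3 and no intercept column, df = 10 leaves 10 - 3 = 7
# interior knots, which bs() places at quantiles of X
knots_used <- attr(bs(X, df = 10, degree = 3), "knots")
length(knots_used)

# Evaluate the fitted curve and its pointwise standard errors on a grid
grid <- data.frame(X = seq(min(X), max(X), length.out = 100))
pred <- predict(fit, newdata = grid, se.fit = TRUE)
band <- cbind(lower = pred$fit - 2 * pred$se.fit,
              upper = pred$fit + 2 * pred$se.fit)
head(band)
```

Specifying df rather than explicit knots lets bs() choose the knot locations from the data; passing knots = ... instead gives full control over where the flexibility goes.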
\[\hat{f}(x) = \hat{\beta}_0 + \hat{\beta}_1 b_1(x)+ \hat{\beta}_2 b_2(x)+ \dots + \hat{\beta}_{K+3} b_{K+3}(x)\]